An inventory of greenhouse gas emissions due to natural gas pipeline incidents in the United States and Canada from 1980s to 2021

Natural gas is widely regarded as a critical transitional energy source. However, when natural gas pipelines fail, they release large amounts of greenhouse gases (GHGs), including methane from uncontrolled natural gas venting and carbon dioxide from flared natural gas. The GHG emissions caused by pipeline incidents are not included in regular inventories, so the counted GHG amount deviates from reality. This study, for the first time, establishes an inventory framework for GHG emissions covering all natural gas pipeline incidents in the two largest gas producers and consumers in North America (the United States and Canada) from the 1980s to 2021. The inventory comprises GHG emissions resulting from gathering and transmission pipeline incidents in a total of 24 states or regions in the United States between 1970 and 2021, local distribution pipeline incidents in 22 states or regions between 1970 and 2021, as well as natural gas pipeline incidents in a total of 7 provinces or regions in Canada between 1979 and 2021. These datasets can improve the accuracy of regular emission inventories by covering more emission sources in the United States and Canada, and they provide essential information for climate-oriented pipeline integrity management.


Background & Summary
The United States and Canada are the two largest producers and consumers of natural gas in North America 1-3 . Their natural gas pipeline network is extensive and has been developed over six decades [4][5][6] . Natural gas pipelines are categorized into three types, namely gathering, transmission, and local distribution pipelines. These categories exhibit notable distinctions in terms of their functions, materials, operating pressures, and other relevant parameters and factors 7 . The detailed information about the three types of natural gas pipelines is listed in Supplementary Table 1. According to statistics in 2021, the mileage of natural gas pipelines in the United States is about 330,000 km, and that in Canada is about 84,000 km 8 . In response to increasing export demands, the construction of natural gas pipelines in the United States and Canada will continue to increase steadily.
Incidents occur on natural gas pipelines due to corrosion, external interference, and other causes at different stages of pipeline operation, from commissioning to decommissioning [9][10][11]. The predominant constituent of natural gas is methane. Incidents on natural gas pipelines can lead to explosions and consequent casualties [12][13][14][15]. When an incident does not cause combustion, the methane, whose warming potential over a 20-year timeframe is about eighty times greater than that of carbon dioxide, is released directly into the atmosphere [16][17][18]. Previous studies have analyzed the greenhouse gas (GHG) emissions from natural gas pipelines and equipment, and discussed the impact of the installation and operation of natural gas transportation infrastructure on the climate [19][20][21][22]. However, relevant inventory works ignored the GHG emissions generated from natural gas pipelines under abnormal conditions (incidents) [23][24][25]. Although incidents occur only occasionally and the resulting GHG emissions are small relative to those from regular operations, this part cannot be ignored 26,27. For example, the incident that occurred in September 2022 on the Nord Stream 1 and 2 natural gas pipelines may have resulted in a release of 220,000 metric tons of methane [28][29][30].
To understand the GHG emissions caused by natural gas pipeline incidents in the United States and Canada, this study develops a GHG emissions inventory of natural gas pipeline incidents in the United States and Canada from the 1980s to 2021 using Monte Carlo simulation. The dataset shows the total GHG emissions from natural gas pipeline incidents in specific states or provinces at a macro level. It covers carbon dioxide emissions from natural gas combustion as well as direct methane emissions. In comparison to previous datasets, the dataset developed in this work (1) quantitatively determines GHG emissions that were not analysed previously; (2) characterizes the uncertainty associated with the original datasets using Monte Carlo simulations; (3) accounts for the substantial amount of missing data in the existing datasets from the United States and Canada, removing this limitation on the analysis; and (4) estimates GHG emissions resulting from natural gas pipeline incidents dating back to the 1980s, a period (before 2010 in the United States and before 2008 in Canada) whose GHG emissions records have received insufficient attention.
To verify the effectiveness of the proposed method, the estimated results are compared with the GHG emissions using the deterministic method. Based on the GHG emission inventory, both industry and governmental regulators can obtain the risks of GHG emissions caused by natural gas pipeline incidents, and develop a climate-oriented pipeline integrity management plan.

Methods
Method for calculating GHG emissions from single-point incidents. After a natural gas pipeline incident, methane may be emitted directly into the atmosphere, or it may enter the atmosphere in the form of carbon dioxide due to combustion or explosion. Both may coexist in the case of partial combustion of natural gas. Additionally, in some incidents, a portion of the residual natural gas remaining in the pipeline needs to be deliberately burned for safer maintenance. All of these count as GHG emissions caused by pipeline incidents. Thus, the amount of GHG emissions from a pipeline incident depends primarily on the amount of natural gas released and on whether the natural gas is combusted by flaring. If the natural gas in the pipeline incident is combusted by flaring, the GHG emissions can be calculated according to Eq. (1) 31 .

$$\mathrm{GHG}_b = V_1 \times \mathrm{AHC} \times C_c \times f \times C \times \mathrm{GWP}_n \tag{1}$$

where $\mathrm{GHG}_b$ is the GHG emissions of natural gas burning, MT CO$_2$ eq.; $V_1$ is the volume of burned natural gas, one thousand cubic feet (Mcf); AHC is the average heat content of natural gas, at 1.036 million British thermal units per thousand cubic feet (mmbtu/Mcf) 31 ; $C_c$ is the average carbon coefficient of natural gas burning, at 0.01443 MT carbon/mmbtu 31 ; $f$ is the fraction of natural gas oxidized into carbon dioxide; $C$ is the molecular weight ratio of carbon dioxide to carbon, at 44/12; and $\mathrm{GWP}_n$ is the global warming potential, which equals 1 for carbon dioxide. Substituting these constants, Eq. (1) can be simplified as:

$$\mathrm{GHG}_b \approx 0.0548 \times f \times V_1 \tag{2}$$

If the incident does not involve combustion of natural gas, the GHG is directly emitted into the atmosphere in the form of methane. The GHG emissions can be calculated according to Eq. (3) 31 . Herein, the methane emissions can be calculated by dividing the value obtained from Eq. (3) by $\mathrm{GWP}_n$.

$$\mathrm{GHG}_{ub} = \frac{\rho \times 28.317\,V_2}{1000} \times \mathrm{GWP}_n \tag{3}$$

where $\rho$ is the natural gas density, 0.8 kg/m$^3$; $V_2$ is the volume of natural gas released (without burning) into the atmosphere, Mcf; and $\mathrm{GHG}_{ub}$ is the GHG emissions resulting from methane, MT CO$_2$ eq. The factor 28.317 converts Mcf to m$^3$, and the division by 1000 converts kg to MT. As the main component of natural gas is CH$_4$, $\mathrm{GWP}_n = \mathrm{GWP}_{100} = 27.9$ under the 100-year timeframe from the IPCC AR6 WGI report 31 . Therefore, if the release of natural gas in an incident is defined as $V$, and $V = V_1 + V_2$, the GHG emissions in an incident can be expressed as:

$$\mathrm{GHG} = \mathrm{GHG}_b + \mathrm{GHG}_{ub}, \qquad V_1 = \delta V, \qquad V_2 = (1 - \delta) V \tag{4}$$

where $\delta$ is the proportion of burned natural gas in the total released amount in an incident. Additionally, the following assumptions are made when calculating the GHG emissions of a single-point incident 31 :
1. The natural gas release in the incident reported by the operator is accurate.
2. If combustion occurs in the incident, 96% to 100% of the burned natural gas is oxidized to carbon dioxide.
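The single-incident calculation of Eqs. (1) to (4) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the product form of the equations is inferred from the symbol definitions above, the Mcf-to-m³ conversion factor is an assumption added for dimensional consistency, and the default oxidation fraction of 0.98 is simply the midpoint of the 96% to 100% range.

```python
# Constants taken from the text.
AHC = 1.036             # average heat content, mmbtu/Mcf
C_C = 0.01443           # carbon coefficient, MT carbon/mmbtu
CO2_PER_C = 44.0 / 12.0 # molecular weight ratio of CO2 to carbon
GWP_CO2 = 1.0           # global warming potential of carbon dioxide
GWP_CH4 = 27.9          # 100-year GWP of methane (IPCC AR6)
RHO = 0.8               # natural gas density, kg/m^3

# Assumptions (not stated in the text): unit conversions.
M3_PER_MCF = 28.316846592  # 1 Mcf = 1000 ft^3 expressed in m^3
KG_PER_MT = 1000.0


def ghg_burned(v1_mcf, f=0.98):
    """CO2-equivalent emissions (MT) from v1_mcf of burned gas, Eq. (1)."""
    return v1_mcf * AHC * C_C * f * CO2_PER_C * GWP_CO2


def ghg_unburned(v2_mcf):
    """CO2-equivalent emissions (MT) from v2_mcf of vented gas, Eq. (3)."""
    methane_mass_mt = RHO * v2_mcf * M3_PER_MCF / KG_PER_MT
    return methane_mass_mt * GWP_CH4


def ghg_incident(v_mcf, delta, f=0.98):
    """Total emissions (MT CO2 eq.) when a fraction delta of v_mcf burns, Eq. (4)."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie between 0 and 1")
    return ghg_burned(delta * v_mcf, f) + ghg_unburned((1.0 - delta) * v_mcf)
```

Under these assumptions, venting 1,000 Mcf yields roughly 632 MT CO2 eq., whereas burning the same volume completely yields roughly 55 MT CO2 eq., which is why the burned proportion δ matters so much for the total.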

Methodology for estimating GHG emissions of natural gas pipeline incidents at the state-level (or provincial-level).
Considering the differences in mileages, construction, and management modes of natural gas pipelines in various administrative divisions, it is essential to have an inventory of the GHG emissions caused by natural gas pipeline incidents at the state (or provincial) level. Although some datasets such as PHMSA and CER list the release amount and combustion conditions of natural gas in each pipeline incident, the measurement results are approximate and some data are missing (especially for the CER database). Hence, this study uses Monte Carlo simulation to estimate the GHG emissions (including carbon dioxide and methane emissions) resulting from natural gas pipeline incidents across various states or provinces within a defined range of uncertainty. The underlying principle is to fit probability density functions (PDFs) for the released quantity of natural gas and for the combustion ratio of natural gas using existing data, and then to draw samples from these PDFs, equal in number to the actual number of incidents, from which the GHG emissions are estimated. The specific calculation process is shown in Fig. 1. The GHG emissions from all incidents in a state or province are:

$$\mathrm{GHG}_{total} = \sum_{i=1}^{N_t} \mathrm{GHG}_i \tag{5}$$

where $\mathrm{GHG}_i$ is the GHG emissions of the $i$-th incident according to Eq. (4), and $N_t$ is the total number of incidents in the state or province.
In this study, the PDFs of parameters V and δ required for estimating GHG emissions from pipeline incidents in the United States were generated based on data from Pipeline & Hazardous Materials Safety Administration (PHMSA) spanning from January 2010 to February 2023. Corresponding PDFs for Canada were generated based on data from Canada Energy Regulator (CER) spanning from January 2008 to February 2023. Based on the obtained PDFs, an equal number of parameters V and δ were generated, corresponding to the total number of pipeline incidents that occurred between 1970 and 2021 in the United States. This enabled the estimation of GHG emissions resulting from natural gas pipeline incidents in each state of the United States over this 52-year period. A similar approach was used to estimate GHG emissions for Canada spanning from 1979 to 2021 (43-year period). Note that due to the lack of detailed records regarding the release volume and combustion conditions for each incident that occurred in the United States between 1970 and 2009 and in Canada between 1979 and 2007, only information regarding the number of incidents was available for these periods. Hence, the estimated results were based on the assumption that incident characteristics remained unchanged from the 1980s to the present day. The specific description of PHMSA and CER datasets can be found in the section entitled "Data collection".
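The sampling procedure described above can be sketched as follows. This is an illustrative outline only, not the code from Supplementary Codes 1-3: the lognormal and beta distributions, their parameters, the uniform sampling of the oxidation fraction, the unit conversion factor, and all function names are assumptions chosen for the example, since the fitted PDFs are not given in the text.

```python
import random
import statistics

AHC, C_C, CO2_PER_C = 1.036, 0.01443, 44.0 / 12.0  # constants from the text
GWP_CH4, RHO = 27.9, 0.8                            # constants from the text
M3_PER_MCF, KG_PER_MT = 28.316846592, 1000.0        # assumed unit conversions


def incident_ghg(v_mcf, delta, f):
    """MT CO2 eq. for one incident releasing v_mcf, of which delta is burned."""
    burned = delta * v_mcf * AHC * C_C * f * CO2_PER_C
    unburned = RHO * (1.0 - delta) * v_mcf * M3_PER_MCF / KG_PER_MT * GWP_CH4
    return burned + unburned


def simulate_region(n_incidents, sample_v, sample_delta, n_runs=2000, seed=0):
    """Return (mean, 2*sigma) of total emissions over n_runs simulated histories."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        total = 0.0
        for _ in range(n_incidents):      # one draw per recorded incident
            v = sample_v(rng)             # release volume, Mcf
            delta = sample_delta(rng)     # burned proportion
            f = rng.uniform(0.96, 1.0)    # assumed uniform over the stated range
            total += incident_ghg(v, delta, f)
        totals.append(total)
    return statistics.fmean(totals), 2.0 * statistics.stdev(totals)


def us_like_samplers(mu=6.0, sigma=1.5, a=0.5, b=2.0):
    """Hypothetical PDFs: lognormal volume, beta-distributed burned proportion."""
    return (lambda rng: rng.lognormvariate(mu, sigma),
            lambda rng: rng.betavariate(a, b))


def canada_like_samplers(mu=6.0, sigma=1.5, p_burn=0.1):
    """Hypothetical PDFs: burned proportion restricted to 0 or 1."""
    return (lambda rng: rng.lognormvariate(mu, sigma),
            lambda rng: 1.0 if rng.random() < p_burn else 0.0)


if __name__ == "__main__":
    sample_v, sample_delta = us_like_samplers()
    mean, two_sigma = simulate_region(50, sample_v, sample_delta, n_runs=500)
    print(f"{mean:.0f} +/- {two_sigma:.0f} MT CO2 eq.")
```

Passing the samplers in as functions keeps the simulation loop independent of the fitted distributions, so a state with its own PDF and a pooled group of small states can reuse the same routine.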
The calculation of the proportion of unburned natural gas (methane) in the United States and Canada is slightly different because the CER database only records whether combustion occurred in an incident, and does not include the specific volume of natural gas burned. Therefore, for Canada, the proportion of methane can only be 0 or 1, under the assumption that natural gas is either entirely burned or not burned at all in an incident. Moreover, if the amount of data in a state or province is too small, a significant deviation may be caused when fitting the PDF. Therefore, the data of all states or provinces with an effective data volume of less than 20 are combined to form a new dataset for PDF fitting, as shown in Supplementary Tables 2

Fig. 1 The process for estimating the GHG emissions at the state or provincial level.

The pipeline incident records in Canada are included in the CER dataset 33,34 . Unlike the PHMSA datasets in the United States, the CER dataset documents incident information for all types of pipelines, including gas and liquid pipelines. The CER provides two datasets: Dataset A, containing pipeline incident information from 1979 to the present, and Dataset B, containing incident information from 2008 to the present. The latter provides more detailed records of leaked materials and combustion conditions. Nonetheless, the information on leakage in the CER dataset is incomplete. Specifically, Dataset B contains a total of 1,735 incidents from January 2008 to February 2023, of which only 529 are recorded with leakage information. Data loss is more substantial in Dataset A, where only 850 out of 3,199 incidents are recorded with leakage information. Supplementary Table 5 lists the differences between the datasets managed by PHMSA and CER.
Upon analysis, both the PHMSA and CER datasets suffer from limitations in assessing the environmental impact of natural gas pipeline incidents. (1) The PHMSA and CER databases do not adequately present the GHG emissions. (2) Records from the United States prior to 2010 and from Canada prior to 2008 do not contain quantitative information about natural gas release volumes, making it difficult, if not impossible, to estimate the GHG emissions associated with these incidents. (3) Despite the availability of Canadian natural gas release records since 2008, the substantial amount of missing data prevents an effective assessment of the GHG emissions resulting from these incidents.

Data records
In this study, three datasets were generated 35 Tables 1-3. According to the Monte Carlo simulation, Texas has the highest GHG emissions from incidents in gathering and transmission pipeline systems, amounting to (27.44 ± 2.34) million MT CO 2 eq. In the local distribution pipeline systems, Michigan has the highest GHG emissions caused by incidents, reaching (3.59 ± 0.47) million MT CO 2 eq. In Canada, Alberta has the highest GHG emissions at (36.55 ± 99.76) million MT CO 2 eq. Additionally, Supplementary Tables 6-8 and Supplementary Figs. 1, 2 display the carbon dioxide and methane emissions in each system.

Technical Validation
Uncertainties. In this study, the uncertainty in estimating GHG emissions mainly comes from the operators' measurements of natural gas release and their determination of the combusted amount in incidents. The Monte Carlo simulation quantifies this uncertainty for each state (or province or region). The uncertainty range is determined by twice the standard deviation (σ) of 200,000 Monte Carlo simulation results (these data are available at Figshare 35 ). Table 1 displays the uncertainty range of GHG emissions from gathering and transmission pipeline incidents in the United States, which varies between ±7% and ±35%, while the uncertainty range for local distribution pipeline incidents lies between ±9% and ±58% (see Table 2). In comparison to the inventory results for the United States, the uncertainty range for Canada's inventory results is larger, ranging from ±42% to ±335% (see Table 3).
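As a rough illustration of how such an uncertainty range can be derived, the sketch below expresses 2σ as a percentage of the mean. It assumes the percentage range is defined as 2σ divided by the average emissions, which the text does not state explicitly. The function names are hypothetical, and the sample standard deviation is used as an assumption.

```python
import statistics


def relative_uncertainty_percent(mean, two_sigma):
    """Express a 2-sigma half-width as a percentage of the mean."""
    return 100.0 * two_sigma / mean


def uncertainty_range(samples):
    """Return (mean, 2*sigma, percent) for a list of simulated totals."""
    mean = statistics.fmean(samples)
    two_sigma = 2.0 * statistics.stdev(samples)
    return mean, two_sigma, relative_uncertainty_percent(mean, two_sigma)


if __name__ == "__main__":
    # Headline values reported in the Data records section (million MT CO2 eq.).
    reported = {"Texas": (27.44, 2.34), "Michigan": (3.59, 0.47), "Alberta": (36.55, 99.76)}
    for name, (mean, two_sigma) in reported.items():
        print(name, round(relative_uncertainty_percent(mean, two_sigma), 1))
```

Applied to the reported headline values, this gives about ±8.5% for Texas, ±13.1% for Michigan, and ±272.9% for Alberta, each of which falls inside the corresponding range quoted above.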

Validation of the estimation method.
To verify the reliability of the GHG emission estimation method proposed in this study, the estimation results were compared with the results of the deterministic method. The deterministic method calculates GHG emissions based on the data provided by single-point incidents and then adds up the GHG emissions caused by all incidents within the respective states (These data are available at Figshare 35 ). It is noted that, in calculations of GHG emissions from single-point incidents involving combustion, it is assumed that 98% of natural gas is oxidized to carbon dioxide 36 . Due to the large amount of missing data in the CER dataset, this study used incident data from the United States between 2010 and 2021 for validation. The deviation can be defined as: where λ is the deviation; M is the average GHG emissions from Monte Carlo simulations, MT CO 2 eq.; and D is GHG emissions from the deterministic methods, MT CO 2 eq. Figure 4 illustrates that the GHG emissions estimated using the deterministic approach fall within the range of GHG

Discussions
This study accounts for the uncertainty in natural gas release during incidents. However, the accuracy of the GHG emissions inventory (including carbon dioxide and methane emissions) may be affected by incidents that were not reported by operators or that remained undetected for a prolonged period. Given the extensive network of pipelines in the United States and Canada, it is not unusual to encounter reporting gaps, missing data, and other issues when dealing with large-scale incident statistics. These challenges are particularly difficult to address in local distribution pipeline systems, where pipeline layouts are usually complicated, making it even harder to identify incidents and accurately measure the natural gas release. Although the U.S. Environmental Protection Agency (EPA) has made efforts to investigate GHG emissions from natural gas systems 37 , the managed datasets do not specify whether the emissions resulting from incidents are included in the analysis. This may be because the GHG emissions resulting from pipeline incidents are relatively small compared to other sources within the natural gas system, such as the combustion of natural gas for powering pumps and other equipment. However, incidents like the Nord Stream pipeline explosion in September 2022 can generate significant GHG emissions. Therefore, a thorough investigation of GHG emissions resulting from pipeline incidents is meaningful for the climate-oriented integrity management of pipelines. It is recommended that separate inventories be maintained to account for GHG emissions resulting from pipeline incidents.

Code availability
The code utilized for the Monte Carlo simulations in this study is provided in the Supplementary Codes 1-3.